Abstract



This report proposes an empirical Bayes approach to the problem of equating scores on test 
forms taken by very small numbers of test takers. The equated score is estimated separately at 
each score point, making it unnecessary to model either the score distribution or the equating 
transformation. Prior information comes from equatings of other test forms, with an appropriate 
adjustment for possible differences in test length. 
The Problem



Often, a new form of a test is taken at its first administration by a small number of test 
takers. For some tests, the number is usually smaller than 100, frequently smaller than 50, and 
sometimes smaller than 20. Equating scores on the basis of such a small sample of test takers is 
likely to produce a poor estimate of the equating relationship in the target population. Yet, the 
statisticians responsible for determining the raw-to-scale score conversion need an estimate of 
the equating relationship in time to report scores for that small group of test takers. That estimate 
needs to be the best estimate of the equating relationship in the target population that can be 
obtained with the information available. Smoothing the score distributions before equating tends 
to improve the accuracy of the equating (see, for example, Livingston, 1993), but if the samples 
are very small, the score distributions in the samples may differ from the distributions in the 
population in ways that smoothing will not correct. 

One possibility for improving the stability and accuracy of an estimate is to incorporate 
collateral information into the estimation process. The use of collateral information to improve 
the accuracy of an estimate in educational testing dates back at least to Kelley (1947, quoted in 
Lord & Novick, 1968, p. 65), who proposed using information from a group of test takers in the 
estimation of the true score of an individual member of the group. More recently, statisticians 
have used collateral information to improve the accuracy of predictions based on test scores 
(Rubin, 1980), including at least one application involving very small samples (Braun, Jones, 
Rubin, & Thayer, 1983). However, we are not aware of any previous attempt to improve the 
form-to-form equating of test scores by using collateral information from the equating of other 
test forms. Those test forms can be previous forms of the same test or, alternatively, forms of 
other tests that are (in ways that are important to the equating process) similar to the test form to 
be equated. The important similarities might include the type of items, the extent to which the 
items are interdependent, the approximate length of the test, and (for anchor equating designs) 
the characteristics of the equating anchor. 

Relevant prior information is often available. The question is how to incorporate it into 
the process of estimating the equating relationship. A practical procedure need not be 
theoretically optimal, but it must improve the accuracy of the estimate — if not in every case, then 
at least often enough to justify its use. It must be capable of being implemented fairly easily. And 
it should be capable of estimating a curvilinear equating relationship. Two unequally difficult 
test forms, administered to the same population, tend to produce differently skewed distributions
of scores, resulting in a curvilinear equating relationship. An equating method that requires the 
equating transformation to be linear is likely to be inaccurate in this common situation. 

Point by Point 

Incorporating collateral information into an estimate of the equating transformation is a 
complex problem. The approach proposed here is to simplify the problem by reducing it to a 
series of replications of a much simpler task — estimating the equated score that corresponds to a 
single possible score on the new form. If this task can be accomplished for every possible raw 
score on the new form, the result will be an estimate of the equating transformation.¹ (If the
resulting estimate of the equating function is not smooth, the estimate can be improved by
smoothing it.²) The point-by-point approach to equating is not a new concept; it is actually the
basis for equipercentile equating (Angoff, 1984, pp. 97-101; Kolen & Brennan, 2004, pp. 36-46).
In this respect, it differs not only from linear equating but also from kernel equating (von
Davier, Holland, & Thayer, 2004).³

Given a possible raw score x on the new form, the corresponding equated score on the
reference form can be estimated by a weighted average of a current value and a prior value. The
current value is the equated score from the current equating, denoted here as $y_{current}$. The
prior value is $y_{prior}$. The weight of $y_{current}$ should depend on the confidence it is
appropriate to place in the results of the current equating. The larger the sample, the more stable
those results are likely to be, and the more weight it is appropriate to give to $y_{current}$. The
weight of $y_{prior}$ should depend on how many prior equatings contribute to it, on the stability
of the information those equatings provide, and on the extent to which the information they
provide is consistent. All of these factors affect the stability and the usefulness of $y_{prior}$
as a prior estimate. The more stable the prior equatings are and the more consistent they are with
each other, the more weight it is appropriate to give to $y_{prior}$.

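A solution that has these desired properties is the empirical Bayes (EB) estimate

$$y_{EB} = \frac{\operatorname{var}(y_{prior})\, y_{current} + \operatorname{var}(y_{current})\, y_{prior}}{\operatorname{var}(y_{prior}) + \operatorname{var}(y_{current})}, \qquad (1)$$

with appropriate estimates for the variances.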
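The weighting in Equation 1 can be illustrated with a minimal Python sketch. It assumes the usual variance-weighted form, in which each value is weighted by the variance of the other, so that a prior variance of 0 returns the prior value. The function name `eb_estimate` and its argument names are hypothetical, and the fallback used when both variances are 0 is an assumption not taken from the report.

```python
def eb_estimate(y_current, var_current, y_prior, var_prior):
    """Variance-weighted average of the current and prior equated scores.

    Each value is weighted by the variance of the *other* value, so the
    more stable of the two receives the larger weight.
    """
    if var_current < 0 or var_prior < 0:
        raise ValueError("variances must be non-negative")
    total = var_prior + var_current
    if total == 0:
        # Assumption: if both variances are 0, fall back to a simple mean.
        return 0.5 * (y_current + y_prior)
    return (var_prior * y_current + var_current * y_prior) / total


# Equal variances: the estimate lies halfway between the two values.
halfway = eb_estimate(y_current=60.0, var_current=4.0, y_prior=50.0, var_prior=4.0)

# Prior variance of 0: the estimate equals the prior value.
prior_only = eb_estimate(y_current=60.0, var_current=4.0, y_prior=50.0, var_prior=0.0)
```

A large `var_current` (a small sample) pulls the result toward `y_prior`; a large `var_prior` (few or inconsistent prior equatings) pulls it toward `y_current`.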

To adopt this approach is to treat the current equating as if it had been sampled at random 
from a large domain of possible equatings, each with its own new form, reference form, and 
samples of test takers. For each combination of new form and reference form in this domain, 
there is a true equated score y — the equated score that would result from averaging over many 
replications of the equating procedure, each with a different pair of samples of test takers. The 
current equating is a member of that domain. Therefore, its value of y will enter into any 
estimates of quantities that refer to the entire domain of possible equatings. 

If there have been many previous forms of the test that is being equated, the domain of 
possible equatings can be restricted to include only forms of that test. Often, however, there will 
be few previous forms to include — possibly none at all. It will be necessary to specify the 
domain of possible equatings so as to include other tests that are similar to the test to be equated. 
In this case, using collateral information from other tests can provide a better estimate of the 
heterogeneity of the domain. 

To obtain a value for $y_{prior}$ and estimate its variance, consider the domain of all possible
equatings that are relevant to the current equating (including the current equating itself). In each
of these possible equatings, for any given new-form score x, there is a corresponding
reference-form score y. Those y values form a distribution, and that distribution has a mean and a
variance. If the mean of this distribution were known, it could be used as the prior value. This
distribution cannot be observed, but a small sample of it can be: the values of y that correspond
to x in the prior equatings identified as relevant. The mean of the values that can be observed is
an estimate of the mean of the entire distribution, and that mean, denoted here as $\bar{y}$, can be
used as the prior value. For an estimate of $\operatorname{var}(y_{current})$, use the square of the
conditional standard error of equating, as estimated by a procedure appropriate for the equating
method used. For an estimate of $\operatorname{var}(y_{prior})$, use the following expression⁴ if its
value is greater than 0 (and 0 if it is not):

$$\frac{1}{m-1}\sum_{i=1}^{m}\left(y_i-\bar{y}\right)^2 \;-\; \frac{1}{m}\sum_{i=1}^{m}\hat{\sigma}_i^2. \qquad (2)$$

In this notation, i indexes the equatings, with i = m for the current equating and i = 1 to
m - 1 for the prior equatings used as collateral information. The observed y value in equating i is
denoted as $y_i$, and the mean of the $y_i$ values is denoted as $\bar{y}$. The term
$\hat{\sigma}_i^2$ indicates an estimate of the square of the conditional standard error of the ith
equating, at the relevant score level. (Often an estimate is available from the equating software.)

If the conditional standard error of equating at a particular score level is large (as it often 
is, in the tails of the score distribution), the second term of Equation 2 may well be larger than 
the first term. In that case, the resulting estimate of 0 for the prior variance will cause Equation 1
to simplify to $y_{EB} = y_{prior}$. The EB estimate of the equated score will be the mean of the
values observed in all the relevant equatings. This estimate includes the current equating but gives
it no more weight than any of the others.
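The prior value and its truncated variance estimate can be sketched in Python as follows. The sketch assumes the form derived in the appendix: the sample variance of the observed y values (divisor m - 1) minus the mean of the squared conditional standard errors, set to 0 when the difference is negative. The function name `prior_mean_and_variance` and its argument names are hypothetical.

```python
def prior_mean_and_variance(y_values, se_squared):
    """Return (y_bar, var_prior) for one score level.

    y_values   -- equated scores y_i from all m equatings (current one included)
    se_squared -- estimated squared conditional standard errors, one per equating
    """
    m = len(y_values)
    if m < 2 or len(se_squared) != m:
        raise ValueError("need at least two equatings and one variance per equating")
    y_bar = sum(y_values) / m
    # First term: sample variance of the observed equated scores.
    between = sum((y - y_bar) ** 2 for y in y_values) / (m - 1)
    # Second term: average sampling variance of the individual equatings.
    within = sum(se_squared) / m
    # Truncate at 0, as the text prescribes when the second term dominates.
    return y_bar, max(between - within, 0.0)


# Made-up values: small standard errors leave a positive prior variance ...
y_bar, var_prior = prior_mean_and_variance([50.0, 52.0, 54.0, 56.0], [1.0] * 4)

# ... while large standard errors (as in the tails) truncate it to 0.
_, var_prior_tail = prior_mean_and_variance([50.0, 52.0, 54.0, 56.0], [9.0] * 4)
```

When the truncated value is 0, the EB estimate reduces to `y_bar`, the unweighted mean over all the equatings.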



A Complication 

One complication that users of this approach are likely to encounter is that the test forms 
in the domain of equatings can differ in length. Even if the domain is limited to forms of a single 
test, some forms may have one or more items excluded from scoring (because of problems with 
content, printing errors, etc.). Therefore, a preliminary step before applying the proposed 
procedure is to transform, into percentage terms, the scores on the new form and the reference 
form, in the current equating and in all the equatings to be included as collateral information. 
This transformation consists of subtracting the lowest possible score, dividing by the range of 
possible scores, and multiplying by 100. One consequence of this transformation will be that 
forms that differ in length will have different sets of possible scores. For example, a score of 
50% is possible on a form with an even number of items but not on a form with an odd number 
of items. If the new form in the current equating has an even number of items, one of the raw 
scores to be equated will be 50%. But if in one of the prior equatings the new form has an odd 
number of items, the table of equated scores for that prior equating will not include a new-form 
raw score of 50%. It will be necessary to interpolate. 
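The percentage transformation and the interpolation it makes necessary can be sketched as follows. The report does not specify the interpolation method; linear interpolation is assumed here, and the function names (`to_percent`, `from_percent`, `interpolate`) are hypothetical.

```python
def to_percent(score, min_score, max_score):
    """Subtract the lowest possible score, divide by the range, multiply by 100."""
    return 100.0 * (score - min_score) / (max_score - min_score)


def from_percent(pct, min_score, max_score):
    """Inverse of to_percent: map a percentage back to the raw-score scale."""
    return min_score + pct * (max_score - min_score) / 100.0


def interpolate(x, x_lo, y_lo, x_hi, y_hi):
    """Linear interpolation between (x_lo, y_lo) and (x_hi, y_hi) (an assumption)."""
    if x_hi == x_lo:
        return y_lo
    return y_lo + (y_hi - y_lo) * (x - x_lo) / (x_hi - x_lo)


# A 25-item form (raw scores 0 to 25) has no raw score equal to 50%:
# the nearest possible scores are 12 and 13.
below = to_percent(12, 0, 25)
above = to_percent(13, 0, 25)

# Made-up equated percentages at those two scores, interpolated at 50%.
ypct_at_50 = interpolate(50.0, below, 47.0, above, 51.5)
```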
The Proposed Procedure

The sequence of steps in the proposed procedure is as follows, for a particular raw score
$x_*$ on the new form in the current equating:

1. Transform $x_*$ to $xpct_*$, where $xpct_* = 100\left(\dfrac{x_* - x_{min}}{x_{max} - x_{min}}\right)$.

2. In the current equating, find the equated raw score $y_{current}$ corresponding to $x_*$.
Transform $y_{current}$ to $ypct_{current}$ and label it $ypct_m$ for use in the steps that follow.

3. In the first prior equating, find $x_1$, the new-form raw-score value for that equating for
which $xpct_1 = xpct_*$. In many cases, $x_1$ will not be a possible raw score, and it will be
necessary to interpolate between two possible new-form raw scores. These possible
scores can be denoted as $x_{1+}$ and $x_{1-}$, chosen so that $xpct_{1+}$ and $xpct_{1-}$ are the
nearest values of $xpct_1$ above and below $xpct_*$. Interpolate between the equated scores
that correspond to $x_{1+}$ and $x_{1-}$, expressed as percentages, to determine $ypct_1$.

4. Repeat Step 3 for the other prior equatings, to determine $ypct_2$, $ypct_3$, and so on.

5. Compute $ypct_{prior} = \dfrac{1}{m}\sum_{i=1}^{m} ypct_i$, the mean, over the current and prior
equatings, of the ypct value corresponding to $x_*$.

6. Use Equation 2 to estimate $\operatorname{var}(ypct_{prior})$ from the values of $ypct_1$,
$ypct_2$, $ypct_3$, and so on, and the estimates of the conditional standard error of equating.
Note that these standard errors will need to be interpolated and rescaled to match the units of
the corresponding $ypct_i$ values.

7. Use Equation 1 to estimate $ypct_{EB}$.

8. Transform $ypct_{EB}$ back to the score scale of the reference form in the current
equating.

If the function defined by the successive values of $ypct_{EB}$ for increasing values of $x_*$ is
not smooth, apply a smoothing procedure.
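The eight steps can be sketched for a single raw score as follows. This is an illustration under stated assumptions, not a definitive implementation: each equating is represented by a hypothetical dictionary whose `"y"` and `"se"` lists hold the equated raw score and its conditional standard error for new-form raw scores 0, 1, 2, and so on; `"y_min"` and `"y_max"` give the reference-form score range; interpolation is linear; and Equations 1 and 2 are taken in the variance-weighted and appendix forms. All names are hypothetical.

```python
from bisect import bisect_left


def to_pct(score, lo, hi):
    return 100.0 * (score - lo) / (hi - lo)


def lerp(x, x0, y0, x1, y1):
    return y0 if x1 == x0 else y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def pct_lookup(eq, xpct):
    """Return (ypct, se_pct) of equating `eq` at new-form percent score xpct."""
    n = len(eq["y"])  # possible new-form raw scores are 0 .. n-1 (assumption)
    xs = [to_pct(k, 0, n - 1) for k in range(n)]
    ys = [to_pct(y, eq["y_min"], eq["y_max"]) for y in eq["y"]]
    scale = 100.0 / (eq["y_max"] - eq["y_min"])  # rescale SEs to percent units
    ses = [se * scale for se in eq["se"]]
    hi = min(max(bisect_left(xs, xpct), 1), n - 1)  # bracket xpct between lo and hi
    lo = hi - 1
    return (lerp(xpct, xs[lo], ys[lo], xs[hi], ys[hi]),
            lerp(xpct, xs[lo], ses[lo], xs[hi], ses[hi]))


def eb_equated_score(x_star, current, priors):
    """EB estimate of the equated raw score for new-form raw score x_star."""
    if not priors:
        raise ValueError("at least one prior equating is required")
    xpct = to_pct(x_star, 0, len(current["y"]) - 1)        # Step 1
    looked_up = [pct_lookup(eq, xpct) for eq in priors]    # Steps 3-4
    looked_up.append(pct_lookup(current, xpct))            # Step 2 (i = m)
    ypcts = [y for y, _ in looked_up]
    m = len(ypcts)
    ypct_prior = sum(ypcts) / m                            # Step 5
    between = sum((y - ypct_prior) ** 2 for y in ypcts) / (m - 1)
    within = sum(se ** 2 for _, se in looked_up) / m
    var_prior = max(between - within, 0.0)                 # Step 6 (Equation 2)
    ypct_cur, se_cur = looked_up[-1]
    var_cur = se_cur ** 2
    total = var_prior + var_cur
    if total == 0:
        ypct_eb = ypct_prior
    else:                                                  # Step 7 (Equation 1)
        ypct_eb = (var_prior * ypct_cur + var_cur * ypct_prior) / total
    span = current["y_max"] - current["y_min"]
    return current["y_min"] + ypct_eb * span / 100.0       # Step 8
```

Calling `eb_equated_score` for every possible `x_star` yields the full estimated conversion, which can then be smoothed if necessary.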
Specifying the Domain of Collateral Information

An important question in implementing this procedure is how broadly to define the 
domain of prior equatings to be used as collateral information. Should they be limited to previous 
forms of the test to be equated? If equatings of forms of other tests are to be included, in what 
ways must those tests be similar to the test to be equated? This question is one that can best be 
answered empirically, and the answer is likely to depend on the set of tests being considered. At 
a selected percent-correct score level on the new form (e.g., 60%), record the equated scores in 
all the prior equatings of test forms being considered for inclusion in the domain — possibly 
several forms of each test. Compare the variation between tests with the variation between 
different forms of the same test. Repeat this procedure at several selected score levels. If the 
results do not indicate a systematic difference between tests, it seems reasonable to conclude that 
prior equatings of other tests will provide useful collateral information. 
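The comparison described above can be sketched, at a single score level, as a purely descriptive summary. The report does not prescribe a particular statistic or decision rule, so the choice of variances here is an assumption, as are the function name `variation_summary`, the test labels, and the made-up values.

```python
from statistics import mean, variance


def variation_summary(ypct_by_test):
    """Summarize variation in equated percent scores at one score level.

    ypct_by_test -- dict mapping a test name to the equated percent scores
                    observed in the prior equatings of that test's forms
    """
    # Variation between different forms of the same test, averaged over tests.
    within = [variance(v) for v in ypct_by_test.values() if len(v) > 1]
    # Variation between tests, based on each test's mean equated score.
    test_means = [mean(v) for v in ypct_by_test.values()]
    return {"within_test": mean(within), "between_test": variance(test_means)}


# Made-up equated scores at the 60% level for forms of two hypothetical tests.
summary = variation_summary({
    "Test A": [60.0, 62.0, 64.0],
    "Test B": [59.0, 61.0, 63.0],
})
```

Because the test means themselves carry some form-to-form variation, a between-test value that is small relative to the within-test value is only a rough indication that the tests do not differ systematically. Repeating the summary at several score levels, as the text suggests, gives a fuller picture.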

In deciding which tests to consider for possible inclusion in the domain of collateral 
information, the most important factor would be the extent to which the test forms tend to differ 
in difficulty. Some tests are constructed from items that have been thoroughly pretested on 
representative samples of the test-taker population. Forms of those tests are likely to show only 
small differences in difficulty. Other tests are constructed from items for which no pretest 
information is available. Forms of those tests are likely to differ much more in difficulty. 

Another important question is whether the domain of collateral information should
include both possible equatings of each pair of forms (i.e., the equating of X to Y and the
equating of Y to X). If both are included, the differences in difficulty will tend to cancel each
other out in the estimation of $y_{prior}$ in Equation 1. The prior estimate of the equating
transformation will be very close to the identity, and the value of y will be very close to that of x,
when both x and y are expressed as percentages. One way to think about this question is to ask: If,
in the past, the new forms in the domain have tended to be more difficult (or, alternatively,
easier) than the reference forms they were equated to, does that tendency represent a genuine
trend? If so, is it realistic to expect this trend to continue in the future? If the answer to either of
these questions is no, it may be best to use the identity as the prior estimate. In that case, the
main function of the collateral information will be to estimate $\operatorname{var}(y_{prior})$.
Advantages and Limitations

Repeating the proposed procedure for any given score on the new form to be equated will 
produce an estimate of the equated score that would result from equating in a very large sample 
from the target population. This estimate should be, in most cases, a better estimate than the 
equated score implied by the current small-sample equating. It should also be, in most cases, a 
better estimate than could be obtained by disregarding the current equating and using, instead, 
the score implied by a previous equating or by the average of several previous equatings. 

Because the proposed procedure estimates the equated score separately for each raw- 
score value, it does not require constraints on the form of the equating transformation. In 
particular, it is not constrained to produce an equating transformation with a particular slope, or 
even a transformation with a constant slope. It can use collateral information from equatings 
computed by different methods, even if some of those equatings were constrained to be linear 
and others were not. 

One limitation of the proposed procedure, from a theoretical point of view, is that it is not 
symmetric with respect to the new form and reference form. Therefore, it is not, strictly 
speaking, an equating procedure. Instead, it is an estimation procedure — a procedure for 
estimating the results that would be obtained if the current equating could be performed with 
data from the full target population. Even though the function to be estimated is symmetric in X
and Y, the best available estimate of it from small-sample data may not be symmetric in X and Y.

A more important limitation, from a practical point of view, is the difficulty of estimating
$\operatorname{var}(y_{current})$ in Equation 1. The commonly used formulas may not yield accurate
results when used with small-sample data. Resampling studies may be necessary to develop a
prediction formula for $\operatorname{var}(y_{current})$ as a function of sample size, so as not to
require any parameters to be estimated from the small-sample data.

The greatest drawback to the proposed procedure is that there are situations in which it 
can produce a result that is less accurate than the current equating. If there is reason to believe, a 
priori, that the current form differs in difficulty from its reference form in a way that the new 
forms in prior equatings did not, it would be unwise to use the procedure presented here. Such a 
situation could arise if the new form were deliberately constructed to be easier or harder than 
previous forms of the same test. It also could arise if the new form were being equated to a 
reference form known to be unusually easy or unusually difficult. 
In conducting resampling studies to evaluate the proposed procedure, the authors’
colleague Sooyeon Kim encountered a situation in which several prior forms of a test were 
similar in difficulty to the reference forms they were equated to, but one subsequent form was 
much harder than its reference form. The close agreement among the prior equatings gave the 
collateral information a heavy weight in the EB estimate, pulling the estimate toward an 
incorrect value (Kim, Livingston, & Lewis, 2008). The important practical question is: Which is
the greater danger — being misled by the collateral information or being misled by an anomalous 
small-sample equating result? Some previous writers have taken an extreme point of view, 
suggesting that when the available samples for equating are smaller than a specified size, the 
right thing to do is to disregard the data entirely and assume the new form and reference form to 
be of equal difficulty at all score levels (Kolen & Brennan, 2004, pp. 289-290; Skaggs, 2005, 
p. 309). But there is an alternative that seems preferable: using available data to indicate how 
much weight to give to the small-sample equating results and how much to another estimate of 
the equating transformation. 
Notes



1 If the number of possible scores on the new form is very large, it may be necessary to select a 
subset of the possible raw scores, compute the equated scores for those selected raw scores, 
and interpolate for the raw scores not selected. 

2 See, for example, the method described by Kolen (1984). This technique is typically referred to
as postsmoothing to distinguish it from the presmoothing of the score distributions before
equating.

3 It is also quite different from the approach taken by van der Linden (2006) in which “ ... unlike
equipercentile equating, ... there is a different function for each test taker” (p. 359).

4 The derivation is shown in the appendix. 
Appendix

Derivation of Estimates for $y_{prior}$ and $\operatorname{var}(y_{prior})$

Let i index the equatings to be used in estimating $y_{prior}$ and $\operatorname{var}(y_{prior})$,
from 1 to m. Each of these equatings has its own new form and its own reference form.

Given a specified raw-score value x on the new form in equating i, let $y_i$ represent the
corresponding equated score. That is, $y_i$ is the score on the reference form in equating i that
corresponds to the specified raw score x on the new form.

Suppose it were possible to repeat, infinitely many times, the process of sampling test
takers to take the new form in equating i, sampling test takers to take the reference form in
equating i, and performing the equating of the new form to the reference form in those samples
to observe a value of $y_i$. If the resulting values could then be averaged over all those
replications of equating i, there would be a quantity that could reasonably be considered the true
equated score for equating i, at new-form score level x. Let that quantity be denoted by
$a_i = \operatorname{avg}(y_i \mid i)$, using avg to mean the average over infinitely many
replications of equating i. Let $e_i = y_i - a_i$ represent the amount by which a particular observed
value of $y_i$ is higher or lower than the average value $a_i$. Then
$\operatorname{avg}(e_i \mid i) = 0$ and, because the replications of equatings $i$ and $i'$ are
independent, for any two different equatings $i$ and $i'$,
$\operatorname{avg}(e_i e_{i'} \mid i, i') = 0$.

Let $\sigma_i^2$ represent the variance of $y_i$ over the many replications of the sampling and
equating procedure of equating i (in other words, the square of the conditional standard error of
equating i at new-form raw score x).

Since $a_i$ does not vary over the replication process, $\operatorname{var}(a_i \mid i) = 0$ and
$\operatorname{cov}(a_i, e_i \mid i) = 0$. Therefore,
$\operatorname{avg}(e_i^2 \mid i) = \operatorname{var}(e_i \mid i) = \operatorname{var}(y_i \mid i) = \sigma_i^2$
and, if $i = i'$, $\operatorname{avg}(e_i e_{i'} \mid i, i') = \operatorname{avg}(e_i^2 \mid i) = \sigma_i^2$.

Let $\bar{y} = \dfrac{1}{m}\sum_{i=1}^{m} y_i$, the average y-value over the m equatings indexed by i.
This quantity will be the value for $y_{prior}$ in Equation 1.
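Let $s_a^2 = \dfrac{1}{m-1}\sum_{i=1}^{m}\left(a_i - \bar{a}\right)^2$, where $\bar{a}$ denotes the mean
of the $a_i$. This is the quantity to be estimated, to provide a value for
$\operatorname{var}(y_{prior})$ in Equation 1.

One plausible estimate of $s_a^2$ is $\dfrac{1}{m-1}\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2$.
However, this estimate may require a correction for bias. To determine the correction needed, first
rewrite the sum of squares:

$$\begin{aligned}
\sum_{i=1}^{m}(y_i-\bar{y})^2 &= \sum_{i=1}^{m}\left[(y_i-a_i)+(a_i-\bar{a})+(\bar{a}-\bar{y})\right]^2 \\
&= \sum_{i=1}^{m}(y_i-a_i)^2+\sum_{i=1}^{m}(a_i-\bar{a})^2+\sum_{i=1}^{m}(\bar{a}-\bar{y})^2 \\
&\quad +2\sum_{i=1}^{m}(y_i-a_i)(a_i-\bar{a})+2(\bar{a}-\bar{y})\sum_{i=1}^{m}(y_i-a_i)+2(\bar{a}-\bar{y})\sum_{i=1}^{m}(a_i-\bar{a}).
\end{aligned}$$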

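Now $y_i - a_i = e_i$ for all i, and therefore $\bar{y} - \bar{a} = \bar{e}$.
Consequently, the above expression can be rewritten as

$$\begin{aligned}
&\sum_{i=1}^{m}e_i^2+\sum_{i=1}^{m}(a_i-\bar{a})^2+\sum_{i=1}^{m}(-\bar{e})^2 \\
&\quad +2\sum_{i=1}^{m}e_i(a_i-\bar{a})+2(-\bar{e})\sum_{i=1}^{m}e_i+2(-\bar{e})\sum_{i=1}^{m}(a_i-\bar{a}).
\end{aligned}$$

Since $\bar{e} = \dfrac{1}{m}\sum_{i=1}^{m} e_i$, this expression can be rewritten again as

$$\begin{aligned}
&\sum_{i=1}^{m}e_i^2+\sum_{i=1}^{m}(a_i-\bar{a})^2+\sum_{i=1}^{m}\left(-\frac{1}{m}\sum_{i'=1}^{m}e_{i'}\right)^2 \\
&\quad +2\sum_{i=1}^{m}e_i(a_i-\bar{a})-2\left(\frac{1}{m}\sum_{i=1}^{m}e_i\right)\sum_{i=1}^{m}e_i-2\left(\frac{1}{m}\sum_{i=1}^{m}e_i\right)\sum_{i=1}^{m}(a_i-\bar{a})
\end{aligned}$$

and again, as

$$\begin{aligned}
&\sum_{i=1}^{m}e_i^2+\sum_{i=1}^{m}(a_i-\bar{a})^2+\frac{1}{m}\sum_{i=1}^{m}\sum_{i'=1}^{m}e_ie_{i'} \\
&\quad +2\sum_{i=1}^{m}e_i(a_i-\bar{a})-2\,\frac{1}{m}\sum_{i=1}^{m}\sum_{i'=1}^{m}e_ie_{i'}-2\,\frac{1}{m}\left(\sum_{i=1}^{m}e_i\right)\sum_{i=1}^{m}(a_i-\bar{a}).
\end{aligned}$$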

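Now the average over replications for each of the m equatings can be taken. (Note that $a_i$
does not vary over replications of equating i.) This average is

$$\begin{aligned}
&\sum_{i=1}^{m}\operatorname{avg}(e_i^2\mid i)+\sum_{i=1}^{m}(a_i-\bar{a})^2+\frac{1}{m}\sum_{i=1}^{m}\sum_{i'=1}^{m}\operatorname{avg}(e_ie_{i'}\mid i,i') \\
&\quad +2\sum_{i=1}^{m}\operatorname{avg}(e_i\mid i)(a_i-\bar{a})-2\,\frac{1}{m}\sum_{i=1}^{m}\sum_{i'=1}^{m}\operatorname{avg}(e_ie_{i'}\mid i,i')-2\,\frac{1}{m}\left(\sum_{i=1}^{m}\operatorname{avg}(e_i\mid i)\right)\sum_{i=1}^{m}(a_i-\bar{a}).
\end{aligned}$$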

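To simplify this expression further, use the following results developed earlier:
$\operatorname{avg}(e_i \mid i) = 0$, $\operatorname{avg}(e_i^2 \mid i) = \sigma_i^2$,
$\operatorname{avg}(e_i e_{i'} \mid i, i') = 0$ for $i \neq i'$, and
$\operatorname{avg}(e_i e_{i'} \mid i, i') = \operatorname{avg}(e_i^2 \mid i) = \sigma_i^2$ for $i = i'$.

These results make the expression equal to

$$\begin{aligned}
&\sum_{i=1}^{m}\sigma_i^2+\sum_{i=1}^{m}(a_i-\bar{a})^2+\frac{1}{m}\sum_{i=1}^{m}\sigma_i^2 \\
&\quad +2\sum_{i=1}^{m}0\cdot(a_i-\bar{a})-2\,\frac{1}{m}\sum_{i=1}^{m}\sigma_i^2-2\,\frac{1}{m}\left(\sum_{i=1}^{m}0\right)\sum_{i=1}^{m}(a_i-\bar{a}) \\
&= \left(1+\frac{1}{m}-2\,\frac{1}{m}\right)\sum_{i=1}^{m}\sigma_i^2+\sum_{i=1}^{m}(a_i-\bar{a})^2 \\
&= \left(\frac{m-1}{m}\right)\sum_{i=1}^{m}\sigma_i^2+\sum_{i=1}^{m}(a_i-\bar{a})^2.
\end{aligned}$$

This quantity is the expectation of $\sum_{i=1}^{m}(y_i - \bar{y})^2$ over replications of all the m
equatings. Therefore, the expectation of $\dfrac{1}{m-1}\sum_{i=1}^{m}(y_i - \bar{y})^2$ is

$$\frac{1}{m}\sum_{i=1}^{m}\sigma_i^2+\frac{1}{m-1}\sum_{i=1}^{m}(a_i-\bar{a})^2.$$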



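Suppose there are unbiased estimates $\hat{\sigma}_i^2$ for each of the $\sigma_i^2$. Then, over
replications of all the m equatings (i = 1 to m), the expected value of

$$\frac{1}{m-1}\sum_{i=1}^{m}(y_i-\bar{y})^2-\frac{1}{m}\sum_{i=1}^{m}\hat{\sigma}_i^2$$

is equal to

$$\frac{1}{m-1}\sum_{i=1}^{m}(a_i-\bar{a})^2 = s_a^2.$$

Thus an unbiased estimate for $s_a^2$ is

$$\frac{1}{m-1}\sum_{i=1}^{m}(y_i-\bar{y})^2-\frac{1}{m}\sum_{i=1}^{m}\hat{\sigma}_i^2.$$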
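The bias correction derived above can be checked numerically with a small simulation. The sketch below makes several assumptions that are not part of the derivation: the errors $e_i$ are drawn from normal distributions, the true $\sigma_i^2$ are used in place of unbiased estimates of them, and the result is not truncated at 0. The function name `simulate_bias` and the input values are hypothetical.

```python
import random


def simulate_bias(true_scores, sigmas, n_reps=20000, seed=0):
    """Compare the naive and corrected estimates of s_a^2 over many replications.

    true_scores -- the true equated scores a_i of the m equatings
    sigmas      -- the conditional standard errors sigma_i of the m equatings
    Returns (s_a^2, mean naive estimate, mean corrected estimate).
    """
    rng = random.Random(seed)
    m = len(true_scores)
    a_bar = sum(true_scores) / m
    s2_a = sum((a - a_bar) ** 2 for a in true_scores) / (m - 1)
    mean_sigma2 = sum(s * s for s in sigmas) / m
    naive_total = 0.0
    corrected_total = 0.0
    for _ in range(n_reps):
        # One replication: observe y_i = a_i + e_i in each equating.
        ys = [rng.gauss(a, s) for a, s in zip(true_scores, sigmas)]
        y_bar = sum(ys) / m
        naive = sum((y - y_bar) ** 2 for y in ys) / (m - 1)
        naive_total += naive
        corrected_total += naive - mean_sigma2
    return s2_a, naive_total / n_reps, corrected_total / n_reps


# Made-up true equated scores and standard errors for four equatings.
s2_a, naive_mean, corrected_mean = simulate_bias(
    [48.0, 50.0, 53.0, 55.0], [1.0, 2.0, 1.5, 2.5])
```

Under these assumptions, `naive_mean` should exceed `s2_a` by about the mean of the $\sigma_i^2$, while `corrected_mean` should fall close to `s2_a`.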